{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# An Introduction to the Amazon Fraud Detector API  \n",
    "#### Supervised fraud detection  \n",
    "-------\n",
    "- [Introduction](#Introduction)\n",
    "- [Setup](#Setup)\n",
    "- [Plan](#Plan)\n",
    "\n",
    "\n",
    "## Introduction\n",
    "-------\n",
    "\n",
    "Amazon Fraud Detector is a fully managed service that makes it easy to identify potentially fraudulent online activities such as online payment fraud and the creation of fake accounts. Fraud Detector capitalizes on the latest advances in machine learning (ML) and 20 years of fraud detection expertise from AWS and Amazon.com to automatically identify potentially fraudulent activity so you can catch more fraud faster.\n",
    "\n",
    "In this notebook, we'll use the Amazon Fraud Detector API to define an entity and event of interest and use CSV data stored in S3 to train a model. Next, we'll derive some rules and create a \"detector\" by combining our entity, event, model, and rules into a single endpoint. Finally, we'll apply the detector to a sample of our data to identify potentially fraudulent events.\n",
    "\n",
    "After running this notebook you should be able to:\n",
    "- Define an Entity and Event\n",
    "- Create a Detector\n",
    "- Train a Machine Learning (ML) Model\n",
    "- Author Rules to identify potential fraud based on the model's score\n",
    "- Apply the Detector's \"predict\" function, to generate a model score and rule outcomes on data\n",
    "\n",
    "If you would like to know more, please check out [Fraud Detector's Documentation](https://docs.aws.amazon.com/frauddetector/). \n",
    "\n",
    "\n",
    "## Setup\n",
    "------\n",
    "First setup your AWS credentials so that Fraud Detector can store and access training data and supporting detector artifacts in S3.\n",
    "\n",
    "\n",
    "### Setting up AWS Credentials & Permissions\n",
    "\n",
    "https://docs.aws.amazon.com/frauddetector/latest/ug/set-up.html\n",
    "\n",
    "To use Amazon Fraud Detector, you have to set up permissions that allow access to the Amazon Fraud Detector console and API operations. You also have to allow Amazon Fraud Detector to perform tasks on your behalf and to access resources that you own. We recommend creating an AWS Identify and Access Management (IAM) user with access restricted to Amazon Fraud Detector operations and required permissions. You can add other permissions as needed.\n",
    "The following policies provide the required permission to use Amazon Fraud Detector:\n",
    "\n",
    "*AmazonFraudDetectorFullAccessPolicy* \n",
    "- Allows you to perform the following actions:\n",
    "    - Access all Amazon Fraud Detector resources  \n",
    "    - List and describe all model endpoints in Amazon SageMaker  \n",
    "    - List all IAM roles in the account  \n",
    "    - List all Amazon S3 buckets  \n",
    "    - Allow IAM Pass Role to pass a role to Amazon Fraud Detector  \n",
    "\n",
    "* AmazonS3FullAccess* \n",
    "- Allows full access to Amazon S3. This is required to upload training files to S3.\n",
    "\n",
    "  \n",
    "\n",
    "To use Amazon Fraud Detector, you have to set up permissions that allow access to the Amazon Fraud Detector console and API operations. You also have to allow Amazon Fraud Detector to perform tasks on your behalf and to access resources that you own. We recommend creating an AWS Identify and Access Management (IAM) user with access restricted to Amazon Fraud Detector operations and required permissions. You can add other permissions as needed.\n",
    "\n",
    "The following policies provide the required permission to use Amazon Fraud Detector:\n",
    "\n",
    "- *AmazonFraudDetectorFullAccessPolicy*  \n",
    "    Allows you to perform the following actions:  \n",
    "        - Access all Amazon Fraud Detector resources  \n",
    "        - List and describe all model endpoints in Amazon SageMaker  \n",
    "        - List all IAM roles in the account  \n",
    "        - List all Amazon S3 buckets  \n",
    "        - Allow IAM Pass Role to pass a role to Amazon Fraud Detector  \n",
    "\n",
    "- *AmazonS3FullAccess*  \n",
    "    Allows full access to Amazon S3. This is required to upload training files to S3.  \n",
    "\n",
    "\n",
    "\n",
    "## Plan\n",
    "### Plan a Fraud Detector\n",
    "------\n",
    "A Detector contains the event, model(s) and rule(s) detection logic for a particular type of fraud that you want to detect. We'll use the following 7 step process to plan a Fraud Detector:  \n",
    "\n",
    "1.\tSetup your notebook\n",
    "    - Name the major components entity, entity type, model, detector\n",
    "    - Plug in your ARN role\n",
    "    - Plug in your S3 Bucket and CSV File\n",
    "2.\tRead and Profile your Data\n",
    "    - This will give you an idea of what your dataset contains\n",
    "    - This will also identify the variables and labels that will need to be created to define your event\n",
    "3.\tCreate event variables and labels\n",
    "    - This will create the variables and labels in fraud detector\n",
    "4.\tDefine your Entity and Event Type\n",
    "    - What is the activity that you are detecting? That's likely your Event Type (e.g., account_registration)\n",
    "    - Who is performing this activity? That's likely your Entity (e.g., customer)\n",
    "5.\tCreate and Train your Model\n",
    "    - Model training takes anywhere from 45-60 minutes, once complete you need to promote your model\n",
    "    - Promote your model\n",
    "6.\tCreate Detector, generate Rules and assemble your Detector\n",
    "    - Create your detector\n",
    "    - Create rules based on your model scores\n",
    "        - Define outcomes (e.g., fraud, investigate and approve)\n",
    "    - Assemble your detector by adding your model and rules to it\n",
    "7.\tTest your Detector\n",
    "    - Interactively call predict on a handful of records\n",
    "\n",
    "\n",
    "A *Detector* contains the event, model(s) and rule(s) detection logic for a particular type of fraud that you want to detect. We'll use the following 7 step process to plan a Fraud Detector: \n",
    "\n",
    "1. Setup your notebook\n",
    "    - name the major components entity, entity type, model, detector .\n",
    "    - plug in your ARN role\n",
    "    - plug in your S3 Bucket and CSV File\n",
    "\n",
    "2. Read and Profile your Data. \n",
    "    - this will give you an idea of what your dataset contains. \n",
    "    - this will also identify the variables and labels that will need to be created to define your event. \n",
    " \n",
    "3. Create event variables and labels\n",
    "    - this will create the variables and labels in fraud detector \n",
    "    \n",
    "4. Define your Entity and Event Type \n",
    "    - What is activity that you are detecting? that's likely your Event Type ex. account_registration\n",
    "    - Who is performing this activity? that's likely your Entity ex. customer \n",
    "    \n",
    "5. Create and Train your Model   \n",
    "    - model training takes anywhere from 45-60 minutes, once complete you need to promote your endpoint  \n",
    "    - promote your model\n",
    "    \n",
    "6. Create Detector, generate Rules and assemble your Detector  \n",
    "    - create your detector \n",
    "    - create rules based on your model scores \n",
    "        - define outcomes ex:  fraud, investigate and approve \n",
    "    - assemble your detector \n",
    "        - combines rules and model into a \"detector\n",
    "    \n",
    "7. Test your Detector \n",
    "    - Interactively call predict on a handful of record \n",
    "     "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<style>.container { width:90% }</style>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "from IPython.core.display import display, HTML\n",
    "from IPython.display import clear_output\n",
    "display(HTML(\"<style>.container { width:90% }</style>\"))\n",
    "# ------------------------------------------------------------------\n",
    "\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "import pandas as pd\n",
    "pd.set_option('display.max_rows', 500)\n",
    "pd.set_option('display.max_columns', 500)\n",
    "pd.set_option('display.width', 1000)\n",
    "\n",
    "import os\n",
    "import sys\n",
    "import time\n",
    "import json\n",
    "import uuid \n",
    "from datetime import datetime\n",
    "\n",
    "# -- AWS stuff -- \n",
    "import boto3\n",
    "import sagemaker\n",
    "\n",
    "# -- sklearn --\n",
    "from sklearn.metrics import roc_curve, roc_auc_score, auc, roc_auc_score\n",
    "%matplotlib inline "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import sagemaker\n",
    "import boto3\n",
    "\n",
    "sagemaker_session = sagemaker.Session()\n",
    "role = sagemaker.get_execution_role()\n",
    "bucket = sagemaker_session.default_bucket()\n",
    "region = boto3.Session().region_name"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "# -- initialize the AFD client \n",
    "client = boto3.client('frauddetector')\n",
    "\n",
    "# -- suffix is appended to detector and model name for uniqueness  \n",
    "sufx   = datetime.now().strftime(\"%Y%m%d\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1. Setup \n",
    "-----\n",
    "\n",
    "***To get started ***  \n",
    "\n",
    "1. Name the major components of Fraud Detector.\n",
    "2. Plug in your ARN role \n",
    "3. Plug in your S3 Bucket and CSV File \n",
    "\n",
    "Then you can interactively exeucte the code cells in the notebook, no need to change anything unless you want to. \n",
    "\n",
    "\n",
    "<div class=\"alert alert-info\"> <strong> Fraud Detector Components </strong>\n",
    "Fraud Detector Components:  EVENT_TYPE is a business activity that you want evaluated for fraud risk. ENTITY_TYPE represents the \"what or who\" that is performing the event you want to evaluate. MODEL_NAME is the name of your supervised machine learning model that Fraud Detector trains on your behalf. DETECTOR_NAME is the name of the detector that contains the detection logic (model and rules) that you apply to events that you want to evaluate for fraud.\n",
    "\n",
    "</div>\n",
    "\n",
    "\n",
    "-----\n",
    "\n",
    "### Bucket, File, and ARN Role\n",
    "\n",
    "Bucket, ARN and Model Name Identify the following assets. S3_BUCKET is the name of the bucket where your file lives. S3_FILE is the URL to your s3 file. ARN_ROLE is the data access role \"ARN\" for the training data source.\n",
    "\n",
    "\n",
    "\n",
    "<div class=\"alert alert-info\"><strong> Bucket, ARN and Model Name </strong>\n",
    "\n",
    "Identify the following assets. S3_BUCKET is the name of the bucket where your file lives. S3_FILE is the URL to your s3 file. ARN_ROLE is the data access role \"ARN\" for the training data source.\n",
    "\n",
    "</div>\n",
    "\n",
    "```\n",
    "Note: To use Amazon Fraud Detector, you have to set up permissions that allow access to the Amazon Fraud Detector console and API operations. You also have to allow Amazon Fraud Detector to perform tasks on your behalf and to access resources that you own. We recommend creating an AWS Identify and Access Management (IAM) user with access restricted to. Amazon Fraud Detector operations and required permissions. You can add other permissions as needed. See \"Create an IAM User and Assign Required Permissions\" in the user's guide:\n",
    "```\n",
    "https://docs.aws.amazon.com/frauddetector/latest/ug/frauddetector.pdf\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "# -- This is all you need to fill out. Once complete simply interactively run each code cell. --  \n",
    "\n",
    "ENTITY_TYPE    = \"fraud_detector_entity_type{0}\".format(sufx) \n",
    "ENTITY_DESC    = \"fraud_detector_entity_description: {0}\".format(sufx) \n",
    "\n",
    "EVENT_TYPE     = \"fraud_detector_event_type{0}\".format(sufx) \n",
    "EVENT_DESC     = \"fraud_detector_event_description: {0}\".format(sufx) \n",
    "\n",
    "MODEL_NAME     = \"fraud_detector_model_name{0}\".format(sufx) \n",
    "MODEL_DESC     = \"fraud_detector_model_description: {0}\".format(sufx) \n",
    "\n",
    "DETECTOR_NAME  = \"fraud_detector_name{0}\".format(sufx)                        \n",
    "DETECTOR_DESC  = \"detects synthetic fraud events created: {0}\".format(sufx) \n",
    "\n",
    "ARN_ROLE       = role\n",
    "S3_BUCKET      = \"poc-in-a-box\"\n",
    "S3_FILE        = \"project_1_newaccounts_100k.csv\"\n",
    "S3_FILE_LOC    = \"s3://{0}/{1}\".format(S3_BUCKET,S3_FILE)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2. Profile Your Dataset \n",
    "-----\n",
    "\n",
    "    \n",
    "<div class=\"alert alert-info\"> 💡 <strong> Profiling </strong>\n",
    "\n",
    "The function below will: 1. profile your data, creating descriptive statistics, 2. perform basic data quality checks (nulls, unique variables, etc.), and 3. return summary statistics and the EVENT and MODEL schemas used to define your EVENT_TYPE and TRAIN your MODEL.\n",
    "\n",
    "\n",
    "</div>"
   ]
  },
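  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The data-quality warnings assigned by the profiling function follow a few fixed thresholds. The sketch below restates three of them in plain Python so that their precedence is easy to see. `feature_warning` and the sample values are hypothetical and for illustration only; the label check and the numeric-cardinality check are left out.\n",
    "\n",
    "```python\n",
    "def feature_warning(null_pct, nunique_pct, is_category):\n",
    "    # Simplified restatement of the warning thresholds for a single column.\n",
    "    # Later checks overwrite earlier ones, mirroring the order of assignment.\n",
    "    warning = 'NO WARNING'\n",
    "    if is_category and nunique_pct > 0.9:\n",
    "        warning = 'EXCLUDE, GT 90% UNIQUE'\n",
    "    if 0.2 < null_pct <= 0.5:\n",
    "        warning = 'NULL WARNING, GT 20% MISSING'\n",
    "    if null_pct > 0.5:\n",
    "        warning = 'EXCLUDE, GT 50% MISSING'\n",
    "    return warning\n",
    "\n",
    "# Hypothetical column profiles: (null_pct, nunique_pct, is_category)\n",
    "print(feature_warning(0.00, 0.95, True))    # EXCLUDE, GT 90% UNIQUE\n",
    "print(feature_warning(0.30, 0.10, False))   # NULL WARNING, GT 20% MISSING\n",
    "print(feature_warning(0.75, 0.10, False))   # EXCLUDE, GT 50% MISSING\n",
    "print(feature_warning(0.01, 0.02, True))    # NO WARNING\n",
    "```\n",
    "\n",
    "Because the missing-value checks run last, a column that is both highly unique and heavily missing is reported with the missing-value warning."
   ]
  },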
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "ename": "ClientError",
     "evalue": "An error occurred (AccessDenied) when calling the GetObject operation: Access Denied",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mClientError\u001b[0m                               Traceback (most recent call last)",
      "\u001b[0;32m<ipython-input-7-2e3f75a9793a>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[1;32m     81\u001b[0m \u001b[0ms3\u001b[0m   \u001b[0;34m=\u001b[0m \u001b[0mboto3\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mresource\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m's3'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m     82\u001b[0m \u001b[0mobj\u001b[0m  \u001b[0;34m=\u001b[0m \u001b[0ms3\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mObject\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mS3_BUCKET\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mS3_FILE\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 83\u001b[0;31m \u001b[0mbody\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mobj\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'Body'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m     84\u001b[0m \u001b[0mdf\u001b[0m   \u001b[0;34m=\u001b[0m \u001b[0mpd\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mread_csv\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mbody\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m     85\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m~/anaconda3/envs/amazonei_mxnet_p36/lib/python3.6/site-packages/boto3/resources/factory.py\u001b[0m in \u001b[0;36mdo_action\u001b[0;34m(self, *args, **kwargs)\u001b[0m\n\u001b[1;32m    518\u001b[0m             \u001b[0;31m# instance via ``self``.\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    519\u001b[0m             \u001b[0;32mdef\u001b[0m \u001b[0mdo_action\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 520\u001b[0;31m                 \u001b[0mresponse\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0maction\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m    521\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    522\u001b[0m                 \u001b[0;32mif\u001b[0m \u001b[0mhasattr\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'load'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m~/anaconda3/envs/amazonei_mxnet_p36/lib/python3.6/site-packages/boto3/resources/action.py\u001b[0m in \u001b[0;36m__call__\u001b[0;34m(self, parent, *args, **kwargs)\u001b[0m\n\u001b[1;32m     81\u001b[0m                     operation_name, params)\n\u001b[1;32m     82\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 83\u001b[0;31m         \u001b[0mresponse\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mgetattr\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mparent\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mmeta\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mclient\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0moperation_name\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mparams\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m     84\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m     85\u001b[0m         \u001b[0mlogger\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdebug\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'Response: %r'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mresponse\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m~/anaconda3/envs/amazonei_mxnet_p36/lib/python3.6/site-packages/botocore/client.py\u001b[0m in \u001b[0;36m_api_call\u001b[0;34m(self, *args, **kwargs)\u001b[0m\n\u001b[1;32m    355\u001b[0m                     \"%s() only accepts keyword arguments.\" % py_operation_name)\n\u001b[1;32m    356\u001b[0m             \u001b[0;31m# The \"self\" in this scope is referring to the BaseClient.\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 357\u001b[0;31m             \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_make_api_call\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0moperation_name\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m    358\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    359\u001b[0m         \u001b[0m_api_call\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m__name__\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mstr\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mpy_operation_name\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m~/anaconda3/envs/amazonei_mxnet_p36/lib/python3.6/site-packages/botocore/client.py\u001b[0m in \u001b[0;36m_make_api_call\u001b[0;34m(self, operation_name, api_params)\u001b[0m\n\u001b[1;32m    674\u001b[0m             \u001b[0merror_code\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mparsed_response\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"Error\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m{\u001b[0m\u001b[0;34m}\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"Code\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    675\u001b[0m             \u001b[0merror_class\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mexceptions\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mfrom_code\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0merror_code\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 676\u001b[0;31m             \u001b[0;32mraise\u001b[0m \u001b[0merror_class\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mparsed_response\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0moperation_name\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m    677\u001b[0m         \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    678\u001b[0m             \u001b[0;32mreturn\u001b[0m \u001b[0mparsed_response\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;31mClientError\u001b[0m: An error occurred (AccessDenied) when calling the GetObject operation: Access Denied"
     ]
    }
   ],
   "source": [
    "# --- no changes; just run this code block ---\n",
    "def summary_stats(df):\n",
    "    \"\"\" Generate summary statistics for a panda's data frame \n",
    "        Args:\n",
    "            df (DataFrame): panda's dataframe to create summary statistics for.\n",
    "        Returns:\n",
    "            DataFrame of summary statistics, training data schema, event variables and event lables \n",
    "    \"\"\"\n",
    "    df = df.copy()\n",
    "    rowcnt = len(df)\n",
    "    df['EVENT_LABEL'] = df['EVENT_LABEL'].astype('str', errors='ignore')\n",
    "    df_s1  = df.agg(['count', 'nunique']).transpose().reset_index().rename(columns={\"index\":\"feature_name\"})\n",
    "    df_s1[\"null\"] = (rowcnt - df_s1[\"count\"]).astype('int64')\n",
    "    df_s1[\"not_null\"] = rowcnt - df_s1[\"null\"]\n",
    "    df_s1[\"null_pct\"] = df_s1[\"null\"] / rowcnt\n",
    "    df_s1[\"nunique_pct\"] = df_s1['nunique']/ rowcnt\n",
    "    dt = pd.DataFrame(df.dtypes).reset_index().rename(columns={\"index\":\"feature_name\", 0:\"dtype\"})\n",
    "    df_stats = pd.merge(dt, df_s1, on='feature_name', how='inner').round(4)\n",
    "    df_stats['nunique'] = df_stats['nunique'].astype('int64')\n",
    "    df_stats['count'] = df_stats['count'].astype('int64')\n",
    "    \n",
    "    # -- variable type mapper --  \n",
    "    df_stats['feature_type'] = \"UNKOWN\"\n",
    "    df_stats.loc[df_stats[\"dtype\"] == object, 'feature_type'] = \"CATEGORY\"\n",
    "    df_stats.loc[(df_stats[\"dtype\"] == \"int64\") | (df_stats[\"dtype\"] == \"float64\"), 'feature_type'] = \"NUMERIC\"\n",
    "    df_stats.loc[df_stats[\"feature_name\"].str.contains(\"ipaddress|ip_address|ipaddr\"), 'feature_type'] = \"IP_ADDRESS\"\n",
    "    df_stats.loc[df_stats[\"feature_name\"].str.contains(\"email|email_address|emailaddr\"), 'feature_type'] = \"EMAIL_ADDRESS\"\n",
    "    df_stats.loc[df_stats[\"feature_name\"] == \"EVENT_LABEL\", 'feature_type'] = \"TARGET\"\n",
    "    df_stats.loc[df_stats[\"feature_name\"] == \"EVENT_TIMESTAMP\", 'feature_type'] = \"EVENT_TIMESTAMP\"\n",
    "    \n",
    "    # -- variable warnings -- \n",
    "    df_stats['feature_warning'] = \"NO WARNING\"\n",
    "    df_stats.loc[(df_stats[\"nunique\"] != 2) & (df_stats[\"feature_name\"] == \"EVENT_LABEL\"),'feature_warning' ] = \"LABEL WARNING, NON-BINARY EVENT LABEL\"\n",
    "    df_stats.loc[(df_stats[\"nunique_pct\"] > 0.9) & (df_stats['feature_type'] == \"CATEGORY\") ,'feature_warning' ] = \"EXCLUDE, GT 90% UNIQUE\"\n",
    "    df_stats.loc[(df_stats[\"null_pct\"] > 0.2) & (df_stats[\"null_pct\"] <= 0.5), 'feature_warning' ] = \"NULL WARNING, GT 20% MISSING\"\n",
    "    df_stats.loc[df_stats[\"null_pct\"] > 0.5,'feature_warning' ] = \"EXCLUDE, GT 50% MISSING\"\n",
    "    df_stats.loc[((df_stats['dtype'] == \"int64\" ) | (df_stats['dtype'] == \"float64\" ) ) & (df_stats['nunique'] < 0.2), 'feature_warning' ] = \"LIKELY CATEGORICAL, NUMERIC w. LOW CARDINALITY\"\n",
    "   \n",
    "    # -- target check -- \n",
    "    exclude_fields  = df_stats.loc[(df_stats['feature_warning'] != 'NO WARNING')]['feature_name'].to_list()\n",
    "    event_variables = df_stats.loc[(~df_stats['feature_name'].isin(['EVENT_LABEL', 'EVENT_TIMESTAMP']))]['feature_name'].to_list()\n",
    "    event_labels    = df[\"EVENT_LABEL\"].unique().tolist()\n",
    "    \n",
    "    trainingDataSchema = {\n",
    "        'modelVariables' : df_stats.loc[(df_stats['feature_type'].isin(['IP_ADDRESS', 'EMAIL_ADDRESS', 'CATEGORY', 'NUMERIC' ]))]['feature_name'].to_list(),\n",
    "        'labelSchema'    : {\n",
    "            'labelMapper' : {\n",
    "                'FRAUD' : [df[\"EVENT_LABEL\"].value_counts().idxmin()],\n",
    "                'LEGIT' : [df[\"EVENT_LABEL\"].value_counts().idxmax()]\n",
    "            }\n",
    "        }\n",
    "    }\n",
    "    \n",
    "    \n",
    "    model_variables = df_stats.loc[(df_stats['feature_type'].isin(['IP_ADDRESS', 'EMAIL_ADDRESS', 'CATEGORY', 'NUMERIC' ]))]['feature_name'].to_list()\n",
    "   \n",
    "    \n",
    "    # -- label schema -- \n",
    "    label_map = {\n",
    "        'FRAUD' : [df[\"EVENT_LABEL\"].value_counts().idxmin()],\n",
    "        'LEGIT' : [df[\"EVENT_LABEL\"].value_counts().idxmax()]\n",
    "    }\n",
    "    \n",
    "    \n",
    "    print(\"--- summary stats ---\")\n",
    "    print(df_stats)\n",
    "    print(\"\\n\")\n",
    "    print(\"--- event variables ---\")\n",
    "    print(event_variables)\n",
    "    print(\"\\n\")\n",
    "    print(\"--- event labels ---\")\n",
    "    print(event_labels)\n",
    "    print(\"\\n\")\n",
    "    print(\"--- training data schema ---\")\n",
    "    print(trainingDataSchema)\n",
    "    print(\"\\n\")\n",
    "    \n",
    "    return df_stats, trainingDataSchema, event_variables, event_labels\n",
    "\n",
    "# -- connect to S3, snag file, and convert to a panda's dataframe --\n",
    "s3   = boto3.resource('s3')\n",
    "obj  = s3.Object(S3_BUCKET, S3_FILE)\n",
    "body = obj.get()['Body']\n",
    "df   = pd.read_csv(body)\n",
    "\n",
    "# -- call profiling function -- \n",
    "df_stats, trainingDataSchema, eventVariables, eventLabels = summary_stats(df)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3. Create Variables\n",
    "-----\n",
    "\n",
    "<div class=\"alert alert-info\"> 💡 <strong> Create Variables. </strong>\n",
    "\n",
    "The following section will automatically create your modeling input variables and your model scoring variable for you. \n",
    "\n",
    "</div>"
   ]
  },
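  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The label helper in this section, `create_label`, treats the rarer of the two label values as fraud and the more common one as legitimate. Here is a minimal sketch of that mapping on made-up data, using only the standard library. `build_label_mapper` and `sample_labels` are hypothetical names, not part of the notebook or of the Fraud Detector API.\n",
    "\n",
    "```python\n",
    "from collections import Counter\n",
    "\n",
    "def build_label_mapper(labels):\n",
    "    # Map the rarest label value to FRAUD and the most common one to LEGIT.\n",
    "    counts = Counter(str(label) for label in labels)\n",
    "    ranked = counts.most_common()   # (value, count) pairs, most frequent first\n",
    "    return {'FRAUD': [ranked[-1][0]], 'LEGIT': [ranked[0][0]]}\n",
    "\n",
    "# Made-up labels: 1 appears twice, 0 appears eight times\n",
    "sample_labels = [0, 0, 0, 0, 1, 0, 0, 1, 0, 0]\n",
    "print(build_label_mapper(sample_labels))   # {'FRAUD': ['1'], 'LEGIT': ['0']}\n",
    "```\n",
    "\n",
    "This heuristic assumes that fraud is the minority class. If your data is balanced or the labels are inverted, set the mapping by hand instead."
   ]
  },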
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_stats.loc[(df_stats['feature_type'].isin(['IP_ADDRESS', 'EMAIL_ADDRESS']))]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# --- no changes just run this code block ---\n",
    "def create_label(df, FRAUD_LABEL):\n",
    "    \"\"\"\n",
    "    Returns a dictionary for the model labelSchema, by identifying the rare event as fraud / and common as not-fraud \n",
    "    \n",
    "    Arguments:\n",
    "    df          -- input dataframe \n",
    "    FRAUD_LABEL -- the name of the field that contains fraud label  \n",
    "    \n",
    "    Returns:\n",
    "    labelSchema -- a dictionary containing labelKey & labelMapper \n",
    "    \"\"\"\n",
    "    label_summary = df[FRAUD_LABEL].value_counts()\n",
    "    labelSchema = {'labelKey': FRAUD_LABEL,\n",
    "                   \"labelMapper\" : { \"FRAUD\": [str(label_summary.idxmin())], \n",
    "                                     \"LEGIT\": [str(label_summary.idxmax())]}\n",
    "                  }\n",
    "    client.put_label(\n",
    "                name = str(label_summary.idxmin()),\n",
    "                description = 'FRAUD')\n",
    "    \n",
    "    client.put_label(\n",
    "                name = str(label_summary.idxmax()),\n",
    "                description = 'LEGIT')\n",
    "    return labelSchema\n",
    "    \n",
    "# -- function to create all your variables --- \n",
    "def create_variables(df_stats, MODEL_NAME):\n",
    "    \"\"\"\n",
    "    Returns a variable list of model input variables, checks to see if variable exists,\n",
    "    and, if not, then it adds the variable to Fraud Detector \n",
    "    \n",
    "    Arguments: \n",
    "    enrichment_features  -- dictionary of optional features, mapped to specific variable types enriched (CARD_BIN, USERAGENT)\n",
    "    numeric_features     -- optional list of numeric field names \n",
    "    categorical_features -- optional list of categorical features \n",
    "    \n",
    "    Returns:\n",
    "    variable_list -- a list of variable dictionaries \n",
    "    \n",
    "    \"\"\"\n",
    "    enrichment_features = df_stats.loc[(df_stats['feature_type'].isin(['IP_ADDRESS', 'EMAIL_ADDRESS']))].to_dict(orient=\"record\")\n",
    "    numeric_features = df_stats.loc[(df_stats['feature_type'].isin(['NUMERIC']))]['feature_name'].to_dict()\n",
    "    categorical_features = df_stats.loc[(df_stats['feature_type'].isin(['CATEGORY']))]['feature_name'].to_dict()\n",
    "    \n",
    "    variable_list = []\n",
    "    # -- first do the enrichment features\n",
    "    for feature in enrichment_features: \n",
    "        variable_list.append( {'name' : feature['feature_name']})\n",
    "        try:\n",
    "            resp = client.get_variables(name=feature['feature_name'])\n",
    "        except:\n",
    "            print(\"Creating variable: {0}\".format(feature['feature_name']))\n",
    "            resp = client.create_variable(\n",
    "                    name = feature['feature_name'],\n",
    "                    dataType = 'STRING',\n",
    "                    dataSource ='EVENT',\n",
    "                    defaultValue = '<unknown>', \n",
    "                    description = feature['feature_name'],\n",
    "                    variableType = feature['feature_type'] )\n",
    "                \n",
    "               \n",
    "    # -- check and update the numeric features \n",
    "    for feature in numeric_features: \n",
    "        variable_list.append( {'name' : numeric_features[feature]})\n",
    "        try:\n",
    "            resp = client.get_variables(name=numeric_features[feature])\n",
    "        except:\n",
    "            print(\"Creating variable: {0}\".format(numeric_features[feature]))\n",
    "            resp = client.create_variable(\n",
    "                    name = numeric_features[feature],\n",
    "                    dataType = 'FLOAT',\n",
    "                    dataSource ='EVENT',\n",
    "                    defaultValue = '0.0', \n",
    "                    description = numeric_features[feature],\n",
    "                    variableType = 'NUMERIC' )\n",
    "             \n",
    "    # -- check and update the categorical features \n",
    "    for feature in categorical_features: \n",
    "        variable_list.append( {'name' : categorical_features[feature]})\n",
    "        try:\n",
    "            resp = client.get_variables(name=categorical_features[feature])\n",
    "        except:\n",
    "            print(\"Creating variable: {0}\".format(categorical_features[feature]))\n",
    "            resp = client.create_variable(\n",
    "                    name = categorical_features[feature],\n",
    "                    dataType = 'STRING',\n",
    "                    dataSource ='EVENT',\n",
    "                    defaultValue = '<unknown>', \n",
    "                    description = categorical_features[feature],\n",
    "                    variableType = 'CATEGORICAL' )\n",
    "    \n",
    "    # -- create a model score feature  \n",
    "    model_feature = \"{0}_insightscore\".format(MODEL_NAME)  \n",
    "    # variable_list.append( {'name' : model_feature})\n",
    "    try:\n",
    "        resp = client.get_variables(name=model_feature)\n",
    "    except:\n",
    "        print(\"Creating variable: {0}\".format(model_feature))\n",
    "        resp = client.create_variable(\n",
    "                name = model_feature,\n",
    "                dataType = 'FLOAT',\n",
    "                dataSource ='MODEL_SCORE',\n",
    "                defaultValue = '0.0', \n",
    "                description = model_feature,\n",
    "                variableType = 'NUMERIC' )\n",
    "    \n",
    "    return variable_list\n",
    "\n",
    "\n",
    "model_variables = create_variables(df_stats, MODEL_NAME)\n",
    "print(\"\\n --- model variable dict --\")\n",
    "print(model_variables)\n",
    "\n",
    "\n",
    "model_label = create_label(df, \"EVENT_LABEL\")\n",
    "print(\"\\n --- model label schema dict --\")\n",
    "print(model_label)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4. Create Entity and Event Types\n",
    "-----\n",
    "    \n",
    "<div class=\"alert alert-info\"> 💡 <strong> Entity and Event. </strong>\n",
    "    \n",
    "The following code block will automatically create your entity and event types for you.\n",
    "\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "eventLabels\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# --- no changes just run this code block ---\n",
    "response = client.put_entity_type(\n",
    "    name        = ENTITY_TYPE,\n",
    "    description = ENTITY_DESC\n",
    ")\n",
    "print(\"-- create entity --\")\n",
    "print(response)\n",
    "\n",
    "\n",
    "response = client.put_event_type (\n",
    "    name           = EVENT_TYPE,\n",
    "    eventVariables = eventVariables,\n",
    "    labels         = eventLabels,\n",
    "    entityTypes    = [ENTITY_TYPE])\n",
    "print(\"-- create event type --\")\n",
    "print(response)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 5. Create & Train your Model\n",
    "-----\n",
    "    \n",
    "<div class=\"alert alert-info\"> 💡 <strong> Train Model. </strong>\n",
    "\n",
    "The following section will automatically train and activate your model for you. \n",
    "\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# --- no changes; just run this code block. ---\n",
    "\n",
    "# -- create our model --\n",
    "response = client.create_model(\n",
    "   description   =  MODEL_DESC,\n",
    "   eventTypeName = EVENT_TYPE,\n",
    "   modelId       = MODEL_NAME,\n",
    "   modelType   = 'ONLINE_FRAUD_INSIGHTS')\n",
    "\n",
    "print(\"-- initalize model --\")\n",
    "print(response)\n",
    "# -- initializes the model, it's now ready to train -- \n",
    "response = client.create_model_version(\n",
    "    modelId     = MODEL_NAME,\n",
    "    modelType   = 'ONLINE_FRAUD_INSIGHTS',\n",
    "    trainingDataSource = 'EXTERNAL_EVENTS',\n",
    "    trainingDataSchema = trainingDataSchema,\n",
    "    externalEventsDetail = {\n",
    "        'dataLocation'     : S3_FILE_LOC,\n",
    "        'dataAccessRoleArn': ARN_ROLE\n",
    "    }\n",
    ")\n",
    "print(\"-- model training --\")\n",
    "print(response)\n",
    "\n",
    "\n",
    "# -- model training takes time, we'll loop until it's complete  -- \n",
    "print(\"-- wait for model training to complete --\")\n",
    "stime = time.time()\n",
    "while True:\n",
    "    clear_output(wait=True)\n",
    "    response = client.get_model_version(modelId=MODEL_NAME, modelType = \"ONLINE_FRAUD_INSIGHTS\", modelVersionNumber = '1.0')\n",
    "    if response['status'] == 'TRAINING_IN_PROGRESS':\n",
    "        print(f\"current progress: {(time.time() - stime)/60:{3}.{3}} minutes\")\n",
    "        time.sleep(60)  # -- sleep for 60 seconds \n",
    "    if response['status'] != 'TRAINING_IN_PROGRESS':\n",
    "        print(\"Model status : \" +  response['status'])\n",
    "        break\n",
    "        \n",
    "etime = time.time()\n",
    "\n",
    "# -- summarize -- \n",
    "print(\"\\n --- model training complete  --\")\n",
    "print(\"Elapsed time : %s\" % (etime - stime) + \" seconds \\n\"  )\n",
    "print(response)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "response = client.update_model_version_status (\n",
    "    modelId = MODEL_NAME,\n",
    "    modelType = 'ONLINE_FRAUD_INSIGHTS',\n",
    "    modelVersionNumber = '1.0',\n",
    "    status = 'ACTIVE'\n",
    ")\n",
    "print(\"-- activating model --\")\n",
    "print(response)\n",
    "\n",
    "#-- wait until model is active \n",
    "print(\"--- waiting until model status is active \")\n",
    "stime = time.time()\n",
    "while True:\n",
    "    clear_output(wait=True)\n",
    "    response = client.get_model_version(modelId=MODEL_NAME, modelType = \"ONLINE_FRAUD_INSIGHTS\", modelVersionNumber = '1.0')\n",
    "    if response['status'] != 'ACTIVE':\n",
    "        print(f\"current progress: {(time.time() - stime)/60:{3}.{3}} minutes\")\n",
    "        time.sleep(60)  # sleep for 1 minute \n",
    "    if response['status'] == 'ACTIVE':\n",
    "        print(\"Model status : \" +  response['status'])\n",
    "        break\n",
    "        \n",
    "etime = time.time()\n",
    "print(\"Elapsed time : %s\" % (etime - stime) + \" seconds \\n\"  )\n",
    "print(response)"
   ]
  },
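  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The two wait loops above poll `get_model_version` once a minute with no upper bound, so they never exit if the status never reaches the value they wait for. The snippet below is a minimal sketch, not part of the original workflow, of a polling helper that gives up after a timeout. The names `wait_for_status`, `get_status`, `done`, `timeout_s` and `poll_s` are hypothetical, and the three-hour default is an arbitrary assumption; only the `get_model_version` call and the status strings come from the cells above.\n",
    "\n",
    "```python\n",
    "import time\n",
    "\n",
    "def wait_for_status(get_status, done, timeout_s=3 * 60 * 60, poll_s=60):\n",
    "    # -- call get_status() until done(status) is true; raise once timeout_s seconds have passed --\n",
    "    start = time.time()\n",
    "    while True:\n",
    "        status = get_status()\n",
    "        if done(status):\n",
    "            return status\n",
    "        if time.time() - start > timeout_s:\n",
    "            raise TimeoutError('status is still ' + str(status) + ' after ' + str(timeout_s) + ' seconds')\n",
    "        time.sleep(poll_s)\n",
    "\n",
    "# -- usage with the client and MODEL_NAME defined earlier in this notebook --\n",
    "# status = wait_for_status(\n",
    "#     lambda: client.get_model_version(modelId=MODEL_NAME,\n",
    "#                                      modelType='ONLINE_FRAUD_INSIGHTS',\n",
    "#                                      modelVersionNumber='1.0')['status'],\n",
    "#     done=lambda s: s != 'TRAINING_IN_PROGRESS')\n",
    "```\n",
    "\n",
    "Passing the status lookup in as a function keeps the helper independent of the AWS client, so the same loop can serve both the training wait and the activation wait."
   ]
  },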
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# -- model performance summary -- \n",
    "auc = client.describe_model_versions(\n",
    "    modelId= MODEL_NAME,\n",
    "    modelVersionNumber='1.0',\n",
    "    modelType='ONLINE_FRAUD_INSIGHTS',\n",
    "    maxResults=10\n",
    ")['modelVersionDetails'][0]['trainingResult']['trainingMetrics']['auc']\n",
    "\n",
    "\n",
    "df_model = pd.DataFrame(client.describe_model_versions(\n",
    "    modelId= MODEL_NAME,\n",
    "    modelVersionNumber='1.0',\n",
    "    modelType='ONLINE_FRAUD_INSIGHTS',\n",
    "    maxResults=10\n",
    ")['modelVersionDetails'][0]['trainingResult']['trainingMetrics']['metricDataPoints'])\n",
    "\n",
    "\n",
    "plt.figure(figsize=(10,10))\n",
    "plt.plot(df_model[\"fpr\"], df_model[\"tpr\"], color='darkorange',\n",
    "         lw=2, label='ROC curve (area = %0.3f)' % auc)\n",
    "plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--')\n",
    "plt.xlabel('False Positive Rate')\n",
    "plt.ylabel('True Positive Rate')\n",
    "plt.title( MODEL_NAME + ' ROC Chart')\n",
    "plt.legend(loc=\"lower right\",fontsize=12)\n",
    "plt.axvline(x = 0.02 ,linewidth=2, color='r')\n",
    "plt.axhline(y = 0.73 ,linewidth=2, color='r')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 6. Create Detector, generate Rules and assemble your Detector\n",
    "\n",
    "-----\n",
    "    \n",
    "<div class=\"alert alert-info\"> 💡 <strong> Generate Rules, Create and Publish a Detector. </strong>\n",
    "    \n",
    "The following section will automatically generate a number of fraud, investigate and approve rules based on the false positive rate and score thresholds of your model. These are just example rules that you could create, it is recommended that you fine tune your rules specifically to your business use case.\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# -- initialize your detector -- \n",
    "response = client.put_detector(detectorId  = DETECTOR_NAME, \n",
    "                               description = DETECTOR_DESC, \n",
    "                               eventTypeName = EVENT_TYPE )\n",
    "\n",
    "print(response)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# -- make rules -- \n",
    "model_stat = df_model.round(decimals=2)  \n",
    "\n",
    "m = model_stat.loc[model_stat.groupby([\"fpr\"])[\"threshold\"].idxmax()] \n",
    "\n",
    "def make_rule(x):\n",
    "    rule = \"\"\n",
    "    if x['fpr'] <= 0.05: \n",
    "        rule = \"${0}_insightscore > {1}\".format(MODEL_NAME,x['threshold'])\n",
    "    if x['fpr'] == 0.06:\n",
    "        rule = \"${0}_insightscore <= {1}\".format(MODEL_NAME,x['threshold_prev'])\n",
    "    return rule\n",
    "    \n",
    "m[\"threshold_prev\"] = m['threshold'].shift(1)\n",
    "m['rule'] = m.apply(lambda x: make_rule(x), axis=1)\n",
    "\n",
    "m['outcome'] = \"approve\"\n",
    "m.loc[m['fpr'] <= 0.03, \"outcome\"] = \"fraud\"\n",
    "m.loc[(m['fpr'] > 0.03) & (m['fpr'] <= 0.05), \"outcome\"] = \"investigate\"\n",
    "\n",
    "print (\" --- score thresholds 1% to 6% --- \")\n",
    "print(m[[\"fpr\", \"tpr\", \"threshold\", \"rule\", \"outcome\"]].loc[(m['fpr'] > 0.0 ) & (m['fpr'] <= 0.06)].reset_index(drop=True))\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# -- create outcomes -- \n",
    "def create_outcomes(outcomes):\n",
    "    \"\"\" create Fraud Detector Outcomes \n",
    "    \n",
    "    \"\"\"   \n",
    "    for outcome in outcomes:\n",
    "        print(\"creating outcome variable: {0} \".format(outcome))\n",
    "        response = client.put_outcome(\n",
    "                          name=outcome,\n",
    "                          description=outcome)\n",
    "\n",
    "# -- get distinct outcomes \n",
    "outcomes = m[\"outcome\"].unique().tolist()\n",
    "\n",
    "create_outcomes(outcomes)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "rule_set = m[(m[\"fpr\"] > 0.0) & (m[\"fpr\"] <= 0.06)][[\"outcome\", \"rule\"]].to_dict('records')\n",
    "rule_list = []\n",
    "for i, rule in enumerate(rule_set):\n",
    "    ruleId = \"rule{0}_{1}\".format(i, MODEL_NAME)\n",
    "    rule_list.append({\"ruleId\": ruleId, \n",
    "                      \"ruleVersion\" : '1',\n",
    "                      \"detectorId\"  : DETECTOR_NAME\n",
    "        \n",
    "    })\n",
    "    print(\"creating rule: {0}: IF {1} THEN {2}\".format(ruleId, rule[\"rule\"], rule['outcome']))\n",
    "    try:\n",
    "        response = client.create_rule(\n",
    "            ruleId = ruleId,\n",
    "            detectorId = DETECTOR_NAME,\n",
    "            expression = rule['rule'],\n",
    "            language = 'DETECTORPL',\n",
    "            outcomes = [rule['outcome']]\n",
    "            )\n",
    "    except:\n",
    "        print(\"this rule already exists in this detector\")\n",
    "rule_list    "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "client.create_detector_version(\n",
    "    detectorId = DETECTOR_NAME,\n",
    "    rules = rule_list,\n",
    "    modelVersions = [{\"modelId\":MODEL_NAME, \n",
    "                      \"modelType\" : \"ONLINE_FRAUD_INSIGHTS\",\n",
    "                      \"modelVersionNumber\" : \"1.0\"}],\n",
    "    ruleExecutionMode = 'FIRST_MATCHED'\n",
    "    )\n",
    "\n",
    "print(\"\\n -- detector created -- \")\n",
    "print(response) \n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "response = client.update_detector_version_status(\n",
    "    detectorId= DETECTOR_NAME,\n",
    "    detectorVersionId='1',\n",
    "    status='ACTIVE'\n",
    ")\n",
    "print(\"\\n -- detector activated -- \")\n",
    "print(response)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 7. Make Predictions \n",
    "-----\n",
    "    \n",
    "<div class=\"alert alert-info\"> 💡 <strong> Make Predictions. </strong>\n",
    "    \n",
    "The following section will apply your detector to the first 10 records in your training dataset. To apply your detector to more simply change the record_count, alternatively you can specify the full training data with the following: \n",
    "\n",
    "</div>\n",
    "\n",
    "```python\n",
    "\n",
    "record_count = df.shape()[0]\n",
    "\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# -- this will apply your detector to the first 10 records of your trainig dataset. -- \n",
    "record_count = 10 \n",
    "predicted_dat = []\n",
    "pred_data = df[eventVariables].head(record_count).astype(str).to_dict(orient='records')\n",
    "for rec in pred_data:\n",
    "    eventId = uuid.uuid1()\n",
    "    pred = client.get_event_prediction(detectorId=DETECTOR_NAME, \n",
    "                                       detectorVersionId='1',\n",
    "                                       eventId = str(eventId),\n",
    "                                       eventTypeName = EVENT_TYPE,\n",
    "                                       eventTimestamp = timestampStr, \n",
    "                                       entities = [{'entityType': ENTITY_TYPE, 'entityId':str(eventId.int)}],\n",
    "                                       eventVariables=rec) \n",
    "    \n",
    "    rec[\"score\"]   = pred['modelScores'][0]['scores'][\"{0}_insightscore\".format(MODEL_NAME)]\n",
    "    rec[\"outcome\"] = pred['ruleResults'][0]['outcomes']\n",
    "    predicted_dat.append(rec)\n",
    "    "
   ]
  },
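  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The cell above reads the score and outcomes from each response by position (`modelScores[0]`, `ruleResults[0]`). The snippet below is a sketch of a small helper that extracts the same two fields. The name `summarize_prediction` and the sample response are hypothetical; the sample contains only the keys the cell above reads, and the empty-list guard assumes that a response may come back without any rule result.\n",
    "\n",
    "```python\n",
    "def summarize_prediction(pred, score_name):\n",
    "    # -- returns (score, outcomes); outcomes is an empty list when ruleResults is missing or empty --\n",
    "    score = pred['modelScores'][0]['scores'][score_name]\n",
    "    rule_results = pred.get('ruleResults', [])\n",
    "    outcomes = rule_results[0]['outcomes'] if rule_results else []\n",
    "    return score, outcomes\n",
    "\n",
    "# -- hypothetical response with only the keys read above --\n",
    "sample = {'modelScores': [{'scores': {'demo_insightscore': 912.0}}],\n",
    "          'ruleResults': [{'ruleId': 'rule0_demo', 'outcomes': ['fraud']}]}\n",
    "print(summarize_prediction(sample, 'demo_insightscore'))\n",
    "```"
   ]
  },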
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# -- review your predictons -- \n",
    "predictions = pd.DataFrame(predicted_dat)\n",
    "head(predictions)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Optionally Write Predictions to File\n",
    "\n",
    "<div class=\"alert alert-info\"> 💡 <strong> Write Predictions. </strong>\n",
    "\n",
    "- You can write your prediction dataset to a CSV or Excel to manually review predictions\n",
    "- Simply add a cell below and copy the code below\n",
    "\n",
    "</div>\n",
    "\n",
    "\n",
    "\n",
    "```python\n",
    "\n",
    "# -- optionally write predictions to a CSV file -- \n",
    "predictions.to_csv(MODEL_NAME + \".csv\", index=False)\n",
    "# -- or to a XLS file \n",
    "predictions.to_excel(MODEL_NAME + \".xlsx\", index=False)\n",
    "\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "conda_amazonei_mxnet_p36",
   "language": "python",
   "name": "conda_amazonei_mxnet_p36"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
